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Abstract 

The goal of this paper is to provide a strong integration between constraint modelling 
and relational DBMSs. To this end we propose extensions of standard query languages 
such as relational algebra and SQL, by adding constraint modelling capabilities to them. In 
particular, we propose non-deterministic extensions of both languages, which are specially 
suited for combinatorial problems. Non-determinism is introduced by means of a guessing 
operator, which declares a set of relations to have an arbitrary extension. This new oper- 
ator results in languages with higher expressive power, able to express all problems in the 
complexity class NP. Some syntactical restrictions which make data complexity polyno- 
mial are shown. The effectiveness of both extensions is demonstrated by means of several 
examples. The current implementation, written in Java using local search techniques, is 
described. 

KEYWORDS: Constraint modelling and programming, relational databases, relational 
algebra, SQL, local search. 



1 Introduction 

The efficient solution of NP-hard combinatorial problems, such as resource alloca- 
tion, scheduling, planning, etc. is crucial for many industrial applications, and it 
is often achieved by means of ad-hoc procedural hand- written programs. Declarative 
programming languages like AMPL IjFourer et al. 1998)) and OPL (Va n Hentenryck 1999 ) 
or libraries l|ILOC-98 1998(1 for expressing constraints are commercially available. 
Data encoding the instance are either in text files in an ad-hoc format, or in stan- 
dard relational DBs accessed through libraries callable from programming languages 
such as C++ (cf., e.g., (ILOG-DBLINK 1999)). In other words, there is not a strong 
integration between data definition and constraint modelling and programming lan- 
guages. 

Indeed, such an integration is particularly needed in industrial environments, 



* This paper is an extended and revised version of ICadoli and Mancini 20021 . 



2 



Marco Cadoli and Toni Mancini 



where the necessity for solving combinatorial problems coexists with the presence 
of large databases where data to be processed lie. Hence, constraint solvers that 
operate externally to the databases of the enterprise may lead to a series of disad- 
vantages, first of all a potential lack of the integrity of recorded data. To this end, a 
better coupling between standard data repositories and constraint solving engines 
is highly desiderable. 

The goal of this paper is exactly to integrate constraint modelling and program- 
ming into relational database management systems (R-DBMSs). In particular, we 
show how standard query languages for relational databases can be extended in or- 
der to give them constraint modelling and solving capabilities: with such languages, 
constraint problem specifications can be viewed just like (more complex) queries 
to standard data repositories. In what follows, we propose extensions of standard 
query languages such as relational algebra and SQL, that are able to formulate 
queries defining combinatorial and constraint problems. 

In principle relational algebra can be used as a language for testing constraints. As 
an example, given relations A and B, testing whether all tuples in A are contained in 
B can be done by computing the relation A — B, and then checking its emptiness. 
Anyway, it must be noted that relational algebra is unfeasible as a language for 
expressing NP-hard problems, since it is capable of expressing just a strict subset 
of the polynomial-time queries (cf., e.g., IjAbiteboul et al. 199 5)). As a consequence, 
an extension is needed. 

The proposed generalization of relational algebra is named NP-Alg, and it is 
proven to be capable of expressing all problems in the complexity class NP. We 
focus on NP because this class contains the decisional version of most combinatorial 
problems of industrial relevance ( |Garey and Johnson 1979"| ). NP-Alg is relational 
algebra plus a simple guessing operator, which declares a set of relations to have an 
arbitrary extension. Algebraic expressions are used to express constraints. Several 
interesting properties of NP-Alg are provided: its data complexity is shown to be 
NP-complete, and for each problem £ in NP we prove that there is a fixed query that, 
when evaluated on a database representing the instance of £, solves it. Combined 
complexity is also addressed. 

Since NP-Alg expresses all problems in NP, an interesting question is whether a 
query corresponds to an NP-complete or to a polynomial-time problem. We give 
a partial answer to it, by exhibiting some syntactical restrictions of NP-Alg with 
polynomial-time data complexity. 

In the same way, COnSql (SQL with constraints) is the proposed non-deterministic 
extension of SQL, the well-known language for querying relational databases ( Ull man 1 
having the same expressive power of NP-Alg, and supporting also the specification 
of optimization problems. We believe that writing a COnSql query for the so- 
lution of a combinatorial optimization problem is only moderately more difficult 
than writing SQL queries for a standard database application. The advantage of 
using COnSql is twofold: it is not necessary to learn a completely new language 
or methodology, and integration of the problem solver with the information sys- 
tem of the enterprise can be done very smoothly. The effectiveness of both NP-Alg 
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and COnSql as constraint modelling languages is demonstrated by showing several 
queries which specify combinatorial and optimization problems. 

The structure of the paper is as follows. Syntax and semantics of NP-Alg are 
introduced in Section [5] Some examples of NP-Alg queries for the specification of 
NP-complete combinatorial problems are proposed in Section [3] Main computa- 
tional properties of NP-Alg, including data and combined complexity, expressive 
power, and polynomial fragments, are presented in Section 0] Section [5] contains 
some details of COnSql and its implementation COnSql simulator, as well as 
the specification of some real-world combinatorial and optimization problems. Fi- 
nally, Section contains conclusions as well as references to main related work. 

2 NP-Alg: Syntax and semantics 

We refer to a standard definition of relational algebra with the five operators 
{er, 7T, x,— , U} l|Abiteboul et al. 1995J) . Other operators such as "M" and "/" can 
be defined as usual. Attributes (fields) of relations will be denoted either by their 
names or by their indexes. As an example, given a relation R(a, b), the selection of 
tuples in R with the same values for the two attributes will be denoted in one of the 
following forms: a (R), a (i?) (since there is no confusion to what relation a 

R.a—R.b a—b 

and b refer to), a (R). As for join conditions, they will have atoms of the form 

$1=$2 

a = b or a ^ b where a is an attribute name (or even index) of the relation on 
the left of the join symbol, and b one of that on the right. Finally, temporary rela- 
tions such as T = algexpr{. . .) will be used to make expressions easier to read. As 
usual (cf., e.g., IjChandra and Harel 1980(1 ) queries are defined as mappings which 
are partial recursive and generic, i.e., constants are uninterpreted. 

Let D denote a finite relational database, R the set of its relations, and DOM 
the unary relation representing the set of all constants occurring in D. 

Definition 2.1 [Syntax of NP-Alg) 
An NP-Alg expression has two parts: 

1. A set Q = {Qi Ql) ,...,Qi a " ) } of new relations of arbitrary arity, denoted as 
Guess Qi', . . . , Qn Sets R and Q must be disjoint. 

2. An ordinary expression exp of relational algebra on the new database schema 

[QjR]- 

For simplicity, until Section0]we focus on boolean queries, i.e., queries that admit a 
yes/no answer. For this reason we restrict exp to be a relation which we call FAIL. 

Definition 2.2 {Semantics of NP-Alg) 

The semantics of an NP-Alg expression is as follows: 

1. For each possible extension ext of the relations in Q with elements in DOM, 
the relation FAIL is evaluated, using ordinary rules of relational algebra. 

2. If there exists an extension ext such that the expression for FAIL evaluates 
to the empty relation "0" (denoted as FAIL<>$), the answer to the boolean 
query is "yes" . Otherwise the answer is "no" . 
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When the answer is "yes" , the extension of relations in Q is a solution for the 
problem instance. 



A trivial implementation of the above semantics obviously requires exponential 
time, since there are exponentially many possible extensions of the relations in Q. 
Anyway, as we will show in Section 14.31 some polynomial-time cases indeed exist. 

The reason why we focus on a relation named FAIL is that, typically, it is easy 

to specify a decision problem as a set of constraints (cf. forthcoming Sections El 

andlSJ- As a consequence, an instance of the problem has a solution if and only if 

there is an arbitrary choice of the guessed relations such that all constraints are 

satisfied, i.e., FAIL = 0. A FOUND (1) query can be anyway defined as FOUND = 

DOM — tt(DOM x FAIL). In this case, the answer is "yes" if and only if there is 
$ l 

an extension ext such that FOUND ^ 0. 



3 Examples of NP-Alg queries 

In this section we show the specifications of some NP-complete problems, as queries 
in NP-Alg. All examples are on uninterpreted structures, i.e., on unlabeled directed 
graphs, because we adopt a pure relational algebra with uninterpreted constants. 
As a side-effect, the examples show that, even in this limited setting, we are able 
to emulate bounded integers and ordering. This is very important, because the 
specification of very simple combinatorial problems requires bounded integers and 
ordering. 

In Section[S]we use the full power of COnSql to specify some real-world problems. 



3.1 Graph k-coloring 

We assume a directed graph is represented as a pair of relations NODES^\n) and 
EDGES (2) (from,to) (with tuples in EDGES {2) having components in NODES {1) , 
hence, DOM = NODES). A graph is k-colorable if there is a fc-partition , . . . , 
of its nodes, i.e., a set of k sets such that: 



• Vi e [l,*],Vj e [l,fc],jVi->Q i nQ J = 0, 

• Uti Qi = NODES, 
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and each set Qi has no pair of nodes linked by an edge. The problem is well-known 
to be NP-complete for k > 3 (cf., e.g., ( |Garey and Johnson 1979| )), and it can be 
specified in NP-Alg as follows: 



Guess Qi, ■ ■ ■ , Qfc , 

FAIL.DIS JOINT = (J Q, M Q ; ; 

i^je{i,...,fc} 

k 

FAIL.COVER = NODES A |J Q*; 

8=1 

FAIL.PARTITION = FAIL_DIS JOINT U FAIL.COVER; 
FML_C()L()RING = tt I J [ f tr (Qi x Q*)^ X E/)6*£.s' 

w 1 \ «1#$2 / $l=EDGES.from 



i=l 



$2=EDGES.to 



FAIL = FAIL-PARTITION U FAIL-COLORING. 



(la) 
(lb) 

(1c) 
(Id) 

(le) 
(If) 



Expression (|la|l declares new relations of arity 1. Expression 03 collects all 
constraints a candidate coloring must obey to: 

• Ijlbjl and (flcjl make sure that Qi, . . . ,Qk is a partition of NODES ("A" is 
the symmetric difference operator, i.e., A A B — (A — B) U (B — A), useful 
for testing equality since A A B = <^=> A = B). 

• fie}) checks that each set Qi has no pair of nodes linked by an edge. 

As an example, let k — 3 and the database be as follows: 

NODES EDGES 



from 


to 


1 


2 


1 


4 


2 


3 



An extension of Qi, Qi and Q3 such that FAIL = is: 

Qi Q2 Qz 



Note that such an extension constitutes a solution to the coloring problem. 

We observe that in the specification above the FAIL_PARTITION relation (JTHJ 
makes sure that an extension of Qi, . . . , Q^ is a k-partition of NODES. Such a 
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constraint can be very useful for the specification of problems, so we introduce, as 
syntactic sugar, an expression: 

failPartition { - 1) (N (k \p[ k \ . . . , P^), 

which returns an empty relation if and only if {P^ , . . . , Pn } is a partition of . 
The prefix fail in the name of the expression reminds the user that it should be used 
in checking constraints. We note that the arity of failPartition can, without loss of 
generality, be fixed to 1, since we can always project out the remaining columns. 
Other useful syntactic sugar will be introduced in the following examples, and is 
summarized in Section 



3.2 Independent set 

Let a (directed) graph be defined, as usual, by the two relations NODES^ and 
EDGES {2 \ and let k < \NODES\ be an integer, which is specified by a relation 
containing exactly k tuples. A subset N of NODES, with \N\ > k is said to 
be an independent set of size at least k of the graph if N contains no pair of nodes 
linked by an edge. 

The problem of determining whether an input graph has an independent set of 
size at least k is NP-complete (cf., e.g., ( |Garey and Johnson 19 79)), and it can be 
easily specified in NP-Alg. However, since we have to compare the size of N with 
the integer k (i.e., with the size of relation K), before presenting the NP-Alg query 
for the Independent set problem, we need a method to compare the size of two 
relations N^ and This can be done by deciding whether a proper function 

that maps tuples in N to tuples in K exists. In particular: 

• |iV| = \K\ if and only if there exists a total bijective function between N and 

• \N\ > \K\ if and only if there exists a partial surjective function from Nto K; 

• \N\ < \K\ if and only if there exists a total injective function from N to K. 

To define relational algebra expressions that check whether a relation FUN^ d+r ^ is 
a (total, injective, surjective, or bijective) function from domain to range R^ r \ 
we define the following expressions (for the sake of simplicity, we write definitions 
for d — r = 1, but their extensions to arbitrary d and r are straightforward): 

• failFunction{FUN {2 \ flW) = 

/ \ 

FUN X FUN 
$i=$i 

\ $2=£$2 ) 

where the first and second subexpressions check whether tuples in FUN are 
in the cartesian product D x R, and the third checks whether FUN is mono- 
valued; 

failTotal(FUN {2) , R^) = D - ^(FUN); 



n(FUN) -flU ir(FUN) - R U tt 

$1 / \ $2 / $ 1 
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• failSurjective(FUN {2) ,D^\R^) = R - 7r (FUN); 

• failInjective(FUN {2) , £>(*), flW) = FUN M FUN. 



$i#$i 

A 

$2=$2 



The above expressions evaluate to the empty relation if and only if relation FUN is, 
respectively, a function, a total, surjective, or injective relation (in the mathematical 
sense) from tuples of relation D to tuples of relation R. 

By using the above expressions, we can design new ones, with the goal of com- 
paring the size of two relations D and R: 

• failGeqSize(A UX , D, R) = failFunction(A UX, D, R)UfailSurjective(A UX, D, R); 

• failLeqSize(A UX, D, R) = failGeqSize(A UX, R, D); 

• failEqSize{A UX, D, R) = failLeqSize{A UX, D, R) U failGeqSize(A UX, D, R), 

where A UXis an auxiliary guessed relation that encodes the function between D and 
R. Such auxiliary guessed relations will be omitted as arguments in the remainder 
of the paper if they are not used anywhere else, to enhance readability. 

Returning to the example, the following NP-Alg query specifies the Independent 
set problem: 



The first subexpression of FAIL specifies the constraint \N\ > k, while the second 
one evaluates to the empty relation if and only if no pair of nodes in N is linked by 
an edge. An extension of TV is an independent set (with size at least k) of the input 
graph if and only if the corresponding FAIL relation is empty. 



Given an undirected graph, i.e., the EDGES relation is symmetric, and an integer 
k < \NODES\, a subset TV of NODES, with |7V| > k is said to be a clique of size at 
least k if every pair of distinct nodes of N is linked by an edge (i.e., the subgraph 
induced by N is complete). 

The problem of determining whether a graph has a clique of size at least k is NP- 
complete (cf., e.g., ( |Garey and Johnson 19 79)), and it can be specified in NP-Alg 
as follows (k is encoded as a relation with exactly k tuples): 



Guess iV (1) ; 



FAIL = failGeqSize {1) [N , K) U 



7T (N X N) [XI 

$lL $l=EDGES.from 

A 

$2=EDGES.to 



EDGES . 



3.3 Clique 



Guess N (1) ; 




(N x N) M complement^ (EDGES) 
$i— $i 

A 

$2=$2 
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The structure of the query is very similar to the one of the previous example, except 
for the new expression complement^ k \R^) , which can be defined as 

complement^ (R (k) ) = p (DOM k - R), 
$i— 

$k^'R.$k 

and returns the active complement of the relation given as argument (p is the 
field-renaming operator, used to name all columns of the output relation like those 
of R). Obviously the above query can be written in several other ways. As an 
example, a more efficient one would use the difference operator, instead of the 
join; notwithstanding this, we have chosen the above query to show the use of 
complement. 

3-4 More examples 

We can specify in NP-Alg other famous problems over graphs like Dominating set, 
Transitive closure, and Hamiltonian path. It is worth noting that Transitive clo- 
sure, indeed a polynomial-time problem, is not expressible in relational algebra 
(cf., e.g., IjAbiteboul et al. 1995JI ). because it intrinsically requires a form of recur- 
sion (cf. Sectional). In NP-Alg recursion can be simulated by means of guessing. As 
for Hamiltonian path, this is the problem of finding a traversal of a graph which 
touches each node exactly once. The possibility to specify the Hamiltonian path 
problem in NP-Alg has interesting consequences which deserve some comments. 
Consider a unary relation DOM, with |Z)OM| = M ^ and the complete graph 
C defined by the relations NODES = DOM and EDGES = DOM x DOM. A 
Hamiltonian path H of C is a total ordering of the M elements in DOM: in fact it 
is a successor relation. The transitive closure of H is the corresponding less-than 
relation. As a consequence, we have the possibility to use bounded integers in the 
range [1, M] in our framework, and also arithmetic operations on them. 

Furthermore, the Hamiltonian paths of C correspond to the permutations of 
[1,M]. Permutations are very useful for the specification of several problems. As 
an example, in the n- queens problem (in which the goal is to place n non-attacking 
queens on an n x n chessboard) a candidate solution is a permutation of order n, 
representing the assignment of a pair (row, column) to each queen. Interestingly, 
to check the attacks of queens on diagonals, in NP-Alg we can guess a relation 
encoding the subtraction of elements in DOM. 

Other interesting problems, not involving graphs, can be specified in NP-Alg: 
Satisfiability of a propositional formula and Evenness of the cardinality of a relation 
are some examples. 

3.5 Useful syntactic sugar 

Previous examples show that guessing relations as subsets of DOM k (for integer k) 
is enough to express many NP-complete problems. The forthcoming Theorem 14.31 
shows that this is indeed enough to express all problems in NP. 
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Nevertheless, expressions such as failPartition can make queries more readable. 
In this section we briefly summarize the main expressions that we designed. 

• empty ^(RW) = DOM - tt(DOM x #(*)), evaluates to the empty relation 

$ 1 

if R is a non-empty one (and vice versa). 

• complement 1 -^ (R^) evaluates to the active complement (with respect to 
DOM k ) of R (cf. Section IO. 

• failPartition {1) (N^, P [k) , P [k) ) (cf. Section HO) evaluates to the empty 
relation if and only if {P[ k \ . . . , P^} is a partition of N. 

• failSuccessor^ (SUCC^ 2k \ N^) evaluates to the empty relation if and only 
if SUCC encodes a correct successor relation on elements in AT, i.e., a 1- 
1 correspondence with the interval [1, |JV|] (essentially by checking whether 
SUCC is a Hamiltonian path on the graph with edges defined by N x N). 

• failPermutation^\PERM^ 2k \ N^) evaluates to the empty relation if and 
only if PERM is a permutation of the elements in N^ k \ The ordering sequence 
is given by the first k columns of PERM. 

• failFunction {1) {FUN {d+r \D^ d \R^), failTotal (1) (FUN (d+r) , D^ d \ 
faillnjective (1) {FUN {d+r \ D^ d \ failSurjective (1) (FUN {d+r \ D^ d \ #W) 
(cf. Section 13. 2|) evaluate to the empty relation if and only if FUN is, 
respectively, a function, a total, injective or surjective relation from tuples 
in D to those in R. We remark that, since elements in R can be ordered 
(cf. Section 13.4(1 . FUN is also an integer function from elements of D to 
the interval [1, \R\]. Integer functions are very useful for the specification of 
resource allocation problems, such as Integer knapsack (see also examples in 
Section 15. 2|) . 

• failEqSize {1) {N,K), failGeqSize {1) ' {N ", K), failLeqSize (1) (N , K) (cf. Sec- 
tion EO| evaluate to the empty relation if and only if \N\ is, respectively, 
=,>,<\K\. 

4 Computational aspects of NP-Alg 

In this section we focus on the main computational aspects of NP-Alg: data and 
combined complexity, expressive power, and polynomial fragments. 

Technically, the results presented in this section can be easily obtained from 
corresponding ones formulated for other languages, e.g., existential second order 
logic (ESO). Nevertheless, we believe that when designing a constraint modelling 
language it is of fundamental importance, from the methodological point of view, 
to ascertain its main computational properties. 

4- 1 Data and combined complexity 

The data complexity, i.e., the complexity of query answering assuming the database 
as input and a fixed query (cf. IjAbiteboul et al. 1995|l ). is one of the most important 
computational aspects of a language, since queries are typically small compared to 
the database. 
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Since we can express some NP-complete problems in NP-Alg (cf. Section the 
problem of deciding whether FAILOtf) is NP-hard. Moreover we can prove that the 
data complexity for such a problem is in NP by using the following argument. It is 
possible to generate, in non-deterministic polynomial time, an extension ext of Q. 
The answer is "yes" if and only if there is such an ext that makes FAIL = 0. The 
last check, being the evaluation of an ordinary relational algebra expression, can be 
done in polynomial time in the size of the database. The above considerations give 
us the first computational result on NP-Alg. 

Theorem 4-1 

The data complexity of deciding whether FAILO$ for an NP-Alg query, where the 
input is the database, is NP-complete. 

Another interesting measure is combined complexity, where both the database and 
the query are part of the input. It is well known that, typically, the combined com- 
plexity of a language is much higher than its data complexity IjVardi 1982|l . As for 
NP-Alg, it is possible to show that, when both the database and the query are part 
of the input, the problem of determining whether FAILO$ is hard for the complex- 
ity class NE, defined as \J C>1 NTIME (2 cn ) (cf. < |Papadimitriou 1994) ), i.e., the 
class of all problems solvable by a non-deterministic machine in time bounded by 
2 cn , where n is the size of the input and c is an arbitrary constant. 

Theorem 4-2 

The combined complexity of deciding whether FAIL<>fy for an NP-Alg query, where 
the input is both the database and the query, is NE-hard. 

The proof is quite long and is delayed to |Appendix A| 

4- 2 Expressive power 

The expressiveness of a query language characterizes the problems that can be 
expressed as fixed, i.e., instance independent, queries. In this section we prove the 
main result about the expressiveness of NP-Alg, by showing that it captures exactly 
NP, or equivalently (cf. (Fa gin 1974| )) queries in the existential fragment of second- 
order logic (ESO). 

Of course it is very important to be assured that we can express all problems 
in the complexity class NP. In fact, Theorem 14. II savs that we are able to express 
some problems in NP. We remind that the expressive power of a language is less 
than or equal to its data complexity. In other words, there exist languages whose 
data complexity is hard for class C in which not every query in C can be expressed; 
several such languages are known, cf., e.g., IjAbiteboul et al. 19 95). 

In order to show that NP-Alg is able to express all problems in NP, we illustrate 
a method that transforms an arbitrary formula in ESO into a NP-Alg query. We 
remind that, by Fagin's theorem ( |Fagin 19 74), any collection D of finite databases 
over R is NP-recognizable if and only if it can be defined by a existential second 
order formula. In particular, we deal with ESO formulae of the following kind: 



(3S) (VX) (3Y) v(X,Y), 



(2) 
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where ip is a first-order formula (without quantifiers) containing variables among 
X, Y and involving relational symbols in S U R U {=}. The reason why we can 
restrict our attention to second-order formulae in the above normal form is ex- 
plained in ( |Kolaitis and Papadimitriou 199l| ). As usual, "=" is always interpreted 
as "identity". 

The transformation works in two steps: 

1. The first-order formula <^(X, Y) obtained by eliminating all quantifiers from J5J 
is translated into an expression PHI of plain relational algebra; 

2. The query ip is defined as: 

Guess Q[ ai \...,Q^- 

FAIL = DOM |x| - tt(PHI), 
x 

where di, . . . ,a n are the arities of the n predicates in S, and |X| is the number of 
variables occurring in X. 

The first step is rather standard (cf., e.g., l|Abiteboul et al. 1995JI ). and is briefly 
sketched here just to give the intuition. A relation R (with the same arity) is 
introduced for each predicate symbol r in the relational vocabulary of if, i.e., RUS. 
An atomic formula of first-order logic is translated as the corresponding relation, 
possibly prefixed by a selection that accounts for constant symbols and/or repeated 
variables, and by a renaming of attributes mapping the arguments. Selection can 
be used also for dealing with atoms involving equality. Inductively, the relation 
corresponding to a complex first-order formula is built as follows: 

• / A g translates into F XI G, where F and G are the translations of / and g, 
respectively; 

• /Vfl 1 translates into F'LiG', where F 1 and G' are derived from the translations 
F and G to account for the (possibly) different schemata of / and g; 

• ->/(Z) translates into p (Z?0M ' z ' — F) (p is the column renaming 

$1— >F.$1 
$|Z|-»>.$|Z| 

operator, needed to name columns of (DOM m - F) like those of F). 

It is worth noting that a better translation avoids the insertion of occurrences of the 
DOM relation for the important class of safe formulae (cf., e.g., l|Abiteboul et al. 1 995)'). 
However, these issues are out of the scope of this paper, and will not be taken into 
account. 

Relations obtained through such a translation will be called q-free, because they 
do not contain the projection operator (that plays the role of an existential quan- 
tification), except those implicit in equi-joins. Intuitively, this means that there are 
no existential quantifiers. 

The following theorem claims that the above translation is correct. 

Theorem 4-3 

For any NP-recognizable collection D of finite databases over R -characterized by 
a formula of the kind Q- a database D is in D, i.e., D \= (3S)(VX)(3Y) p(X, Y), 
if and only if FAIL<>%, when ip (cf. formula (0)) is evaluated on D. 
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Proof 

(Only if part) If D € D, it follows that an extension X for predicates in S exists, 
such that: 

[D,H] h(VX)(3Y) v>(X,Y). 

By translating <p on the right side into relational algebra (according to the point 
above), we obtain a relational expression PHI on the relational vocabulary given by 
relations corresponding to predicates in R, plus those corresponding to predicates 
in S (i.e., relations in Q). 

When evaluating PHI on the new database \D, 3], where H are the extensions of 
relations in Q corresponding to the extensions of predicates in E, we obtain that 
for all tuples (X) there exists a tuple (Y) such that the tuple (X, Y) belongs to 
PHI, i.e.: 

V(X)3(Y) : (X,Y) e PHI. 
Since (X) e L>OM |x| , we obtain that £>OM |x| C tt(PHI), implying that the ex- 
pression for FAIL in the NP-Alg query © evaluates to the empty relation for the 
extension H of the guessed tables Q. 

(If part) Suppose that D D. This implies that D \= -.(3S)(VX)(3Y)^(X, Y) or, 
equivalently, that: 

D h (VS)(3X)(VY) ^(X,Y) 

By translating formula ip into relational algebra, we obtain that, for every extension 
H of relations in Q (corresponding to predicates in S), there exists at least one tuple 
(X) such that for every tuple (Y), tuple (X, Y) does not belong to PHI. This implies 
that: 

3(X) e DOM |x| : (X) £ ir(PHI), 

and so that the expression for FAIL in the NP-Alg query does not evaluate to 
the empty relation for all possible extensions S of the guessed tables Q. □ 



4-3 Polynomial fragments 

Polynomial fragments of second-order logic have been presented in, e.g., i|Gottlob et al. 2 004). 
In this section we use some of those results to show that it is possible to isolate 
polynomial fragments of NP-Alg. 

Theorem 4-4 

Let s be a positive integer, PHI a g-free expression of relational algebra over the 
relational vocabulary edb(D) U {Q^}, and Yi,Y 2 the names of two attributes of 
PHI. An NP-Alg query of the form: 



Guess Q (s) ; 
FAIL = ( 

can be evaluated in polynomial time in the size of the database. 



FAIL = (DOM x DOM) - ^ (PHI) 
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Proof 

This class of NP-Alg queries corresponds to the Eaa prefix class of second-order logic 
described in l|Gottlob et al. 2004J1 , which is proved to be polynomial by a mapping 
into instances of 2SAT. The correctness of the translation is formally guaranteed 
by Theorem O □ 

Some interesting queries obeying the above restriction can indeed be formulated. 
As an example, 2-coloring can be specified as follows (when k = 2, /c-coloring, 
cf. Section im becomes polynomial): 

Guess C (1) ; 

complement(EDGES) U 



FAIL = DOM x DOM 



C x complement(C) U complement(C) x C 



C and its complement denote the 2-partition. The constraint states that each edge 
must go from one subset to the other one. 

Another polynomial problem of this class is 2-partition into cliques (cf., e.g., 
( |Garey and Johnson 19791 )), which amounts to decide whether there is a 2-partition 
of the nodes of a graph such that the two induced subgraphs are complete. An NP- 
Alg query which specifies the problem is: 

Guess P (1) ; 

FAIL = DOM x DOM- 

[complement (P) x P U P x complement(P) U EDGES] . 

A second polynomial class is defined by the following theorem. 

Theorem 4-5 

Let PHI(Xi, . . . , Xfc, Yi, Y2) (k > 0) be a g-free expression of relational algebra 
over the relational vocabulary edb(D) U {Q^}- An NP-Alg query of the form: 

Guess Q (1) ; 

X(X 1 ,...,X k ) = PHI{X 1 ,...,X k ,Y u Y 2 ) I p (DOM x DOM); 

FAIL = empty(X). 
can be evaluated in polynomial time in the size of the database. 
Proof 

This class of NP-Alg queries corresponds to the E\e*aa prefix class of second-order 
logic HGottlob et al. 2004|) . which, in turn, is proved to be polynomial by a mapping 
into 2SAT. Also in this case, the correctness of the translation is formally guaranteed 
by Theorem EH □ 

As an example, the specification for the Graph dis connectivity problem, i.e., to 
check whether a graph is not connected, belongs to this class. 
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5 The COnSql language 

In this section we describe the COnSql language, a non-deterministic extension of 
SQL (able to express also optimization problems) whose optimization-free subset 
has the same expressive power as NP-Alg, and present some specifications written 
in this language. 

5. 1 Syntax of conSql 

COnSql is a strict superset of SQL. The problem instance is described as a set 
of ordinary tables, using the data definition language of SQL. The novel construct 
CREATE SPECIFICATION is used to define a problem specification. It has three parts, 
two of which correspond to the parts of Definition 12.11 

1. Definition of the guessed tables, by means of the new keyword GUESS; 

2. Optional definition of an objective function, by means of one of the two key- 
words MAXIMIZE and MINIMIZE; 

3. Specification of the constraints that must be satisfied by guessed tables, by 
means of the standard SQL keyword CHECK. 

Furthermore, the user can specify the desired output by means of the new keyword 
RETURN. In particular, the output is computed when an extension of the guessed 
tables satisfying all constraints and such that the objective function is optimized 
is found. Of course, it is possible to specify many guessed tables, constraints and 
returned tables. The syntax is as follows (we write it in BNF, with terminals either 
capitalized or quoted, and, for every terminal or non-terminal a, "[a]" meaning 
optionality, "a*" a list of an arbitrary number of a, and "a+" meaning "o(o*)"): 

CREATE SPECIFICATION problem_name 1 ( ' 

(GUESS TABLE table_name ['('aliases')'] AS guessed_table_spec)+ 

( (MAXIMIZE I MINIMIZE) ' ( ' aggregate_query ' ) ' 

(CHECK '(' condition ')')+ 

(RETURN TABLE return_table_name AS query)* 

')' 

The guessed table table_name gets its schema from its definition guessecLta- 
ble_spec. The latter expression is similar to a standard SELECT-FRDM-WHERE SQL 
query, except for the FROM clause that can contain also expressions such as: 

SUBSET OF SQL_from_clause I 

[TOTAL I PARTIAL] FUNCTI0N_T0 ' ( ' (range_table I min ' . . ' max) ' ) ' 

AS f ield_name_list OF SQL_f rom_clause I 
(PARTITION '(' n ')' I PERMUTATION) AS field_name OF SQL_f rom_clause 

with SQL_f rom_clause being the content of an ordinary SQL FROM clause (e.g., 
a list of tables). The schema of such expressions consists in the attributes of 
SQL_f rom_clause, plus the extra f ield_name (or f ield_name_list), if present. 
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In the FROM clause the user is supposed to specify the shape of the search space, 
either as a plain subset (like in NP-Alg), or as a mapping (i.e., partition, permuta- 
tion, or function) from the domain defined by SQL_f rom_clause. Mappings require 
the specification of the range and the name of the extra field(s) containing range 
values. As for PERMUTATION, the range is implicitly defined to be a subset of inte- 
gers. As for FUNCTI0N_T0 the range can be either an interval min. .max of a SQL 
enumerable type, (e.g., integers) or the set of values of the primary key of a table 
denoted by range_table. The optional keyword PARTIAL means that the function 
can be defined over a subset of the domain (the default is TOTAL). We remind the 
reader that using partitions, permutations or functions does not add any expressive 
power to the language (cf. Section ITTTH|> . 

As for the objective function, the user is supposed to specify a query whose 
output is a monadic table with only one tuple of an SQL totally ordered type (e.g., 
integers or reals), typically by making use of SQL aggregate operators like COUNT, 
SUM, etc. 

It is possible to specify constraints on the guessed tables by using ordinary SQL 
boolean conditions, e.g., EXISTS, NOT EXISTS, IN, NOT IN, =ANY, =ALL, etc. 

Finally, the query that defines a returned table is an ordinary SQL query on the 
tables defining the problem instance plus the guessed ones, and it is evaluated for 
an arbitrary extension of the guessed tables encoding an optimal solution. This is 
consistent with the semantics adopted by all state-of-the-art systems for Constraint 
Programming. 

Once a problem has been specified, its solution can be obtained with an ordinary 
SQL query on the return tables: 

SELECT f ield_name_list 

FROM problem_name . return_table_name 

WHERE condition 

The table ANSWER (n INTEGER) is implicitly defined locally to the CREATE SPECIF- 
ICATION construct, and it is empty if and only if the problem has no solution. 

5.2 Examples 

In this subsection we exhibit the specification of some problems in COnSql. In 
particular, to highlight its similarity with NP-Alg, we show the specification of the 
graph coloring problem of of Section 13.11 Afterwards, we exploit the full power of 
the language and show how some real-world problems can be easily specified. In 
all the examples, we describe the schema of the input database, and underline key 
fields. 

5.2.1 Graph k-coloring 

We assume an input database over the schema shown in the Entity-Relationship 
(ER) diagram in Figure ^ thus containing relations NODES (n), EDGES (f,t) (en- 
coding the graph), and COLORS '(id,name) (listing the k colors). Once a database 



16 



Marco Cadoli and Toni Mancini 




Fig. 1. ER diagram of the database schema for the /c-coloring problem. The guessed 
table COLORING is in boldface. 

(i.e., a problem instance) has been created (by using standard SQL commands), a 
COnSql specification of the fc-coloring problem is the following: 

CREATE SPECIFICATION Graph_Coloring ( 

/* COLORING contains tuples of the kind <N0DES.n, COLORS. id>, 
with COLORS. id arbitrarily chosen. */ 
GUESS TABLE COLORING AS 

SELECT n, color FROM TOTAL FUNCTI0N_T0 (COLORS) AS color OF NODES 
CHECK ( NOT EXISTS ( 

SELECT * FROM COLORING CI, COLORING C2, EDGES 
WHERE Cl.n <> C2.n AND CI. color = C2. color 
AND Cl.n = EDGES. f AND C2.n = EDGES. t )) 
RETURN TABLE SOLUTION AS SELECT COLORING. n, COLORS. name 
FROM COLORING, COLORS WHERE COLORING. color = COLORS. id 

) 

The GUESS part of the problem specification defines a new (binary) table COLO- 
RING, with fields n and color, as a total function from the set of NODES to the set 
of COLORS. The CHECK statement expresses the constraint an extension of COLORING 
table must satisfy to be a solution to the problem, i.e., there are no two distinct 
nodes linked by an edge which are assigned the same color. 

The RETURN statement defines the output of the problem by a query that is 
evaluated for an extension of the guessed table that satisfies every constraint. The 
user can ask for such a solution with the statement 



SELECT * FROM Graph_Coloring. SOLUTION 

As described in the previous subsection, if no coloring exists, the system table 
Graph_Coloring. ANSWER will contain no tuples. This can be easily checked by the 
user, in order to obtain only a significant Graph_Coloring . SOLUTION table. 
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Fig. 2. ER diagram of the database schema for the University course 
timetabling problem. The guessed table TIMETABLE is in boldface. 



5.2.2 University course timetabling 

The University course timetabling problem (Schaerf 1999) consists in finding the 
weekly scheduling for all the lectures of a set of university courses in a given set 
of classrooms. We consider a variant of the original problem in which the objective 
function to minimize is the total number of students that have to attend overlapping 
lectures. 

The input database schema is shown in Figure and consists of the following 
relations: 

• COURSE(id, numJectures, num_students), consisting of tuples (c, I, s) mean- 
ing that the course c needs I lectures a week, and has s enrolled students. 

• PERIOD(id, start, finish) encoding (non-overlapping) periods, plus informa- 
tion on start and finish time. 

• ROOM(id, capacity). A tuple (r, c) means that room r has capacity c. 

• CONFLICT Y coursel . course2 . numstudents) . A tuple (cl , c2,n) means that 
courses cl and c2 have n common students. 

• UN AVAIL (course , period). A tuple (c,p) means that the teacher of course c 
is not available for teaching at period p. 

A solution to the problem is a (guessed) relation TIMETABLE (period, room, course) 
with tuples {p, r, c) meaning that at period p in room r there is a lecture of course 
c. If for some values of the room and period fields there is no tuple in the relation 
TIMETABLE, then the room is unused in that period. 

A COnSql specification of the timetabling problem, given an input database, is 
the following: 

CREATE SPECIFICATION University_Timetabling ( 
GUESS TABLE TIMETABLE (period, room, course) AS 
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SELECT p. id, r.id, course 

FROM PARTIAL FUNCTI0N_T0 (COURSE) AS course OF PERIOD p, ROOM r 

// Objective function 

MINIMIZE ( SELECT SUM(c .num_ students) 

FROM TIMETABLE tl, TIMETABLE t2, CONFLICT c 
WHERE tl. period = t2. period AND tl. course <> t2. course AND 
c.coursel = tl. course AND c.course2 = t2. course 

) 

// At most one lecture of a course per period 
CHECK ( NOT EXISTS ( 

SELECT * FROM TIMETABLE tl, TIMETABLE t2 

WHERE tl. course = t2. course AND 

tl. period = t2. period AND tl.room <> t2.room 

)) 

// Unavailability constraints 
CHECK ( NOT EXISTS ( 

SELECT * FROM TIMETABLE t, UNA VAIL u 

WHERE t. course = u. course AND t. period = u. period 

)) 

// Capacity constraints 
CHECK ( NOT EXISTS ( 

SELECT * FROM TIMETABLE t, COURSE c, ROOM r 

WHERE t. course = c . id AND t.room = r.id AND 
c .num_students > r. capacity 

)) 

// Teaching requirements 
CHECK ( NOT EXISTS ( 
SELECT * FROM COURSE c 
WHERE c .num_lectures <> 

( SELECT COUNTO) FROM TIMETABLE t 
WHERE t. course = c.id 

) 

)) 

RETURN TABLE SOLUTION AS SELECT * FROM TIMETABLE 

) 

In particular, the constraints force extensions of the guessed table TIMETABLE to 
be such that: 

• There is at most one lecture of a course per period, i.e., there cannot be two 
different rooms allocated for the same course in the same time slot; 

• Unavailability constraints are respected, i.e., no lecture is scheduled in a pe- 
riod for which the relevant teacher is unavailable; 

• Capacity constraints are respected, i.e. , no room is allocated for courses having 
a number of students that exceeds its capacity; 

• Teaching requirements are satisfied, i.e., all courses have a room and a time 
slot assigned for all the lectures they need. 

An extension for guessed table TIMETABLE that satisfies the constraints above 
is an optimal solution to the University course timetabling problem if it minimizes 
the overall number of students that are expected to attend conflicting lectures, i.e., 
lectures that are scheduled at the same time. 
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Fig. 3. ER diagram of the database schema for the Aircraft landing problem. The 
guessed table LANDING is in boldface. 



5.2.3 Aircraft landing 

The aircraft landing problem jBeasley et al. 2000| | consists in scheduling landing 
times for aircraft. Upon entering within the radar range of the air traffic control 
(ATC) at an airport, a plane requires a landing time and a runway on which to land. 
The landing time must lie within a specified time window, bounded by an earliest 
time and a latest time, depending on the kind of the aircraft. Each plane has a 
most economical, preferred speed. A plane is said to be assigned its target time, if 
it is required to fly in to land at its preferred speed. If ATC requires the plane to 
either slow down or speed up, a cost incurs. The bigger the difference between the 
assigned landing time and the target landing time, the bigger the cost. Moreover, 
the amount of time between two landings must be greater than a specified minimum 
(the separation time) that depends on the planes involved. Separation times depend 
on the aircraft landing on the same or different runways (in the latter case they are 
smaller). 

Our objective is to find a landing time for each planned aircraft, encoded in a 
guessed relation LANDING ', satisfying all the previous constraints, and such that 
the total cost (i.e., the sum of the costs associated with each aircraft) is minimized. 
The input database schema is shown in Figure |3 and consists of the following 
relations: 

• AIRCRAFT (id , target .time, earliest -time, latest.time, bef .cost, aft.cost), list- 
ing aircraft planned to land, together with their target times and landing time 
windows; the cost associated with a delayed or advanced landing at time x 
is given by bef .cost ■ max[0, t — x] + aft.cost ■ max[0, x — t], where t is the 
aircraft target time. 

• RUNWAY (id) listing all the runways of the airport. 

• SEPARATION (first, second , intsamejrw, int.diff _rw). A tuple (a, a! , is, id) 



20 



Marco Cadoli and Toni Mancini 



means that if aircraft a' lands after aircraft a, then landing times must be 
separated by is (resp. id) minutes if they land on the same runway (resp. on 
different runways). 

In the following specification, the search space is a total function assigning an 
aircraft to a landing time and a runway. For the sake of simplicity landing times 
are expressed in minutes after a conventional time instant, e.g., the scheduling 
starting time, and the time horizon is set to one day, i.e., 24 x 60 minutes. 

CREATE SPECIFICATION Aircraf t_Landing ( 
GUESS TABLE LANDING AS 

SELECT ar.id AS aircraft, ar. runway, at. time 

FROM (TOTAL FUNCTI0N_T0 (RUNWAY) AS runway OF AIRCRAFT) ar, 

(TOTAL FUNCTI0N_T0(0. .24*60-1) AS time OF AIRCRAFT) at 
WHERE ar.id = at . id 
// Objective function 
MINIMIZE ( SELECT SUM(cost) 
FROM ( 

SELECT a. id, (a.bef_cost * (a.target_time - l.time)) AS cost 
FROM AIRCRAFT a, LANDING 1 

WHERE a. id = 1. aircraft AND l.time <= a.target_time 
UNION // advanced plus delayed aircraft 

SELECT a. id, (a.aft_cost * (l.time - a.target_time)) AS cost 
FROM AIRCRAFT a, LANDING 1 

WHERE a. id = 1. aircraft AND l.time > a.target_time 
) AIRCRAFT_COST // Contains tuples <aircraf t , cost> 

) 

// Time window constraints 
CHECK ( NOT EXISTS ( 

SELECT * FROM LANDING 1, AIRCRAFT a WHERE 1. air craft = a. id 
AND ( l.time > a. latest_time OR l.time < a. earliest_time ) 

)) 

// Separation constraints 
CHECK ( NOT EXISTS ( 

SELECT * FROM LANDING 11, LANDING 12, SEPARATION sep 
WHERE 11. aircraft <> 12. aircraft AND 11. time <= 12. time AND 
sep. first = 11. aircraft AND sep. second = 12 . aircraf t AND 
( ( (11. runway = 12. runway) AND 

(12. time - 11. time) < sep . int_same_rw ) OR 
( (11. runway <> 12. runway) AND 

(12. time - 11. time) < sep . int_dif f _rw ) 

))) 

RETURN TABLE SOLUTION AS SELECT * FROM LANDING 

) 

In particular, the constraints force extensions of the guessed table LANDING to 
be such to respect both time window constraints (i.e., the actual landing time for 
each aircraft must lie inside its landing time window), and separation constraints 
(encoded in the SEPARATION relation). Such an extension is an optimal solution 
to the Aircraft landing problem if it minimizes the overall cost. 
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Fig. 4. conSql simulator overall architecture 



5.3 conSql simulator 

conSql simulator is an application that works as an interface to a traditional 
R-DBMS. It simulates the behavior of a conSql server by reading from its in- 
put stream conSql queries, i.e., ordinary SQL queries and commands, and problem 
specifications. Ordinary SQL queries and commands are simply passed to the under- 
lying R-DBMS, while problem specifications are processed. The overall architecture 
of the system is depicted in Figure In particular, CREATE SPECIFICATION con- 
structs are parsed, creating the new tables (corresponding to the guessed ones) and 
an internal representation of the search space. The search space is then explored by 
the solver, looking for an element corresponding to an optimal solution, by posing 
appropriate queries to the R-DBMS (in standard SQL). As soon as an optimal solu- 
tion is found, results of the queries specified in the RETURN statements are accessible 
to the user as output. 

The implementation of conSql simulator gives much attention to software 
engineering aspects and to different quality factors of software artifacts. In partic- 
ular, the system is platform independent and highly portable, since it is written in 
Java, and uses the standard JDBC protocol for the connection with the R-DBMS, 
and the whole architecture presents a neat separation among the language parser, 
the problem modelling module and the solver engine (JLocal), so as to empha- 
size qualities such as modularity, extendability and reusability. In particular, the 
problem modelling module allows to represent problem specifications in a language 
independent fashion, relying on abstract concepts such as Problem, Search space, 
Objective function, and Constraint, as the conceptual UML diagram in Figure El 
shows. In this way, the solving engine JLocal, interacting with the abstract prob- 
lem modelling module, is independent of the particular language, i.e., conSql. The 
only language-dependent part of the system is the parser for the conSql language, 
that provides concrete implementations for the abstract concepts that compose the 
problem modelling module, building the internal representation of the problem in- 
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Fig. 5. Portion of the conceptual UML class diagram for the language-independent 
problem modelling module. 

stance, the search space, and the constraints, and for some of the services needed 
by the solver. 

As for the search methods, the solver engine JLOCAL exploits local search tech- 
niques to find solutions. Local search is considered one of the most attractive tech- 
niques for solving combinatorial optimization problems (cf., e.g., IjAarts and Lenstra 1997^ . 
being able to solve instances of realistic size in reasonable time. Several local search 
algorithms have been implemented in JLOCAL, among them Hill climbing and 
Tabu search ( |Glover and Laguna 1997| ), and several strategies that combine dif- 
ferent solvers for doing the search are present (e.g., Tandem search, in which two 
different solvers are used in sequence). Additional local search strategies can be 
simply added by subclassing the LocalSearchSolver class (not described here for 
the sake of simplicity) . 

It is worth noting that the user is completely unaware of the search techniques 
implemented by the system. In particular, definitions for neighborhoods, moves, 
aspiration functions, etc., are made by COnSql simulator itself, starting from 
the types of the guessed tables defined in the specification (i.e., subsets, functions, 
permutations, partitions). 

The development of COnSql simulator has been done according to the itera- 
tive model of the software life-cycle. In particular, three iterations were expected. 
The first, prototypical, version of the system, which was used only to test specifi- 
cations, relied on a purely enumerative approach, and, of course, no considerations 
on performances could be done. As for the present version, which is at the second 
iteration of the development process, we added the local search engine, but the 
connection with the DBMS is yet completely black box. In particular, JLOCAL uses 
the DBMS both for maintaining the current state, for checking constraints, and for 
evaluating which neighbor to visit next. The main motivation behind this choice, 
is that current DBMSs offer means to answer queries efficiently, especially in case 
of very large instances. Nonetheless, since constraints are evaluated from scratch in 
every visited state, performances cannot be good, and only instances of small sizes 
can be actually solved. 

In the third version of the system, which is currently under development, we are 
adding the following functionalities: 

1. The ability of checking constraints incrementally: constraints' check is the 
main source of inefficiency of the current version, since they are evaluated 
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from scratch for every visited state, and for all its neighbors, in order to 
choose the best move. Hence, the number of queries posted to the DBMS 
is very high, and all of them are answered independently from each other. 
However, it is clear that, when using local search technology, only a small 
variation in the number of constraint violations is expected, when moving 
from one state to its neighbors. To this end, our goal is to make the DBMS 
able to compute only variations to constraints' violations when performing 
checks. This is expected to greatly increase the overall performances of the 
system, since we can rely on very sophisticated algorithms to, e.g., maintain 
and synchronize views, actually present in currently available DBMSs. 

2. The use of a much more complex local search engine. In particular, we are 
currently integrating EasyLocal++ ( |Di Gaspero and Schaerf 20031 ) > a verv so- 
phisticated solver, with our system. This can lead to better algorithms, and 
to a fine tuning of their parameters. 

3. The addition of an optional "search" part in CREATE SPECIFICATION con- 
structs, as it happens in, e.g., OPL, in order to provide the user with the pos- 
sibility of declaring which search algorithm to adopt, as well as the types of 
neighborhoods, moves, aspiration functions, etc. Of course, as already claimed, 
our goal is to provide good defaults for the options in this part, as, e.g., OPL 
does, by letting the system able to automatically make a good choice for these 
issues, depending on the specification at hand. 

4. To provide a better coupling with a particular open-source DBMS, in order to 
make the system able to directly use the DBMS' APIs, instead of interacting 
by means of (inefficient, but highly portable) protocols like JDBC. 

6 Conclusions, related and future work 

In this paper we have tackled the issue of strong integration between constraint 
modelling and programming and up-to-date technology for storing data. In partic- 
ular we have proposed constraint languages which have the ability to interact with 
data repositories in a standard way. To this end, we have presented NP-Alg, an 
extension of relational algebra which is specially suited for combinatorial problems. 
The main feature of NP-Alg is the possibility of specifying, via a form of non- 
determinism, a set of relations that can have an arbitrary extension. This allows 
the specification of a search space suitable for the solution of combinatorial prob- 
lems, with ordinary relational algebra expressions defining constraints. Although 
NP-Alg provides just a very simple guessing operator, many useful search spaces, 
e.g., permutations and functions, can be defined as syntactic sugar. 

Several computational properties of NP-Alg have been shown, including data 
and combined complexity, and expressive power. Notably, the language is shown 
to capture exactly all the problems in the complexity class NP, which includes 
many combinatorial problems of industrial relevance. In the same way, we have 
proposed COnSql, a non-deterministic extension of SQL, with the same expressive 
power of NP-Alg, which is suitable also for specifying optimization problems. The 
effectiveness of NP-Alg and COnSql both as complex query and constraint mod- 
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clling languages has been demonstrated by showing several queries which specify 
combinatorial problems. 

Other extensions of relational algebra have already been proposed. The most 
important examples are the languages Alg + while and Alg + while + , where, respec- 
tively, a non-inflationary and an inflationary fixpoint semantics is added IjAbiteboul et al~f 995 ) 
Both these languages are capable of expressing the Transitive Closure query, but 
have very different expressive power: Alg + while can express queries in PSPACE, 
but the language captures exactly this class only on ordered databases (i.e., databases 
in which a total order among all constants occurring in it is fixed). As for Alg + while + 
instead, it can express only polynomial-time queries, and the language captures 
the whole PT1ME class only on ordered databases. A feature for expressing linear 
recursion has recently been added also to SQL (sql'99), by means of the WITH con- 
struct. However, both the aforementioned extensions of relational algebra, and the 
new version of SQL do not make such languages suitable for expressing constraint 
problems. 

Several languages and systems for constraint programming are nowadays avail- 
able cither as research and commercial packages. Some of them are in the form 
of frameworks and libraries. As an example, in ECL l PS e ( |Eclipse \ or SICS- 
TUS l|Sicstus \ a traditional programming language such as PROLOG is enhanced by 
means of libraries and specific constructs for specifying constraints, which are then 
solved by highly optimized algorithms. The 1LOG Optimization suite (LLOG-98 1998) 
provides instead libraries for expressing constraints callable by host general-purpose 
programming languages like C++. 

Specification languages natively developed for constraint modelling and program- 
ming are also available, either commercially like OPL ( |Van Hentenryck 1999| ) and 
AMPL l|Fourer et al. 1993}) or as research prototypes, like ESRA IjFlener et al. 2 004 ) , 
all of them offering an ad-hoc syntax for problem specifications. Similarly to NP-Alg 
and COnSql, they support a clear distinction between the data and the problem 
description level, but differently from them, NP-Alg and COnSql use standard and 
well-known languages for specifying problem specifications, that are considered just 
like queries over a relational database representing the input instance. We believe 
that this feature allows for a wider diffusion of the declarative constraint modelling 
paradigm in industrial environments, permitting a very strong integration with the 
information system of the enterprise. Conversely, the other systems usually get in- 
put data from text files in ad-hoc formats, and additional machinery is needed to 
build such files from the content of a relational database, and for storing problem 
solutions. Even if some of them have plug-ins that can be used to make connections 
to databases, e.g., (LLOG-DBLLNK 1999), data are always processed outside the 
DBMS, hence leading to a potential lack of data integrity. 

Several query languages capable of capturing the complexity class NP have been 
shown in the literature. As an example, in ( |Kolaitis and P apadim itriou 199l| ) an 
extension of datalog (the well-known recursive query language (|Ullman 1988|H al- 
lowing negation is proved to have such a property. Another extension of datalog 
capturing NP, without negation but with a form of non-determinism, is proposed 
in QCadoli and Palopoli I998| ). Other rule-based languages with different semantics 
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have also been proposed: Smodels (|Simons et al. 2 002) which relies on stable mod- 
els semantics, and dlv IjLeone et al. |l which is based on answer set programming. 
They also are based on negation and recursion. On the other hand, NP-Alg captures 
NP without recursion. Actually, recursion can be simulated by non-determinism, 
and it is possible to write, e.g., the transitive closure query in NP-Alg. Being non- 
recursive, NP-Alg is more similar to plain existential second order logic. Never- 
theless, it retains the functional character of relational algebra, which sometimes 
makes it easier (with respect to rule-based languages) to specify a problem. 

For what concerns COnSql, we believe it is a clear step towards a language for 
both declarative constraint modelling and complex queries to relational databases, 
which relics on standard and well-known technologies. Currently, the most adopted 
solution for evaluating complex queries over a relational database is to embed SQL 
into a general-purpose programming language, like Java or C++, thus by processing 
stored data and intermediate results outside the database. COnSql instead has been 
designed for being implemented inside the DBMS, so guaranteeing all transactional 
properties to the query evaluation process. 

As for the proposed implementation of COnSql SIMULATOR, it is conceived to 
be based on a purely declarative language and to be ready to use, i.e., it does 
not require any additional code to be written by the user. Other systems for local 
search do, however, exist, either in forms of declarative languages for modelling in 
a concise way local search algorithms (cf., e.g., ( jMichel and Van Hentenryck 2000| 
|Van Hentenryck and Michel 2003] )) or, alternatively, in forms of libraries or frame- 
works (cf., e.g., Local++ IjSchaerf et al. 2000|l '). hence providing algorithms that 
rely on additional application-specific code provided by the user. COnSql SIMULA- 
TOR is different from such systems in that it provides the user with the ability of 
modelling an optimization problem by means of a language, i.e., COnSql, that is 
completely unaware of the particular solving technology used. It is responsibility 
of the engine to provide the local search solver with all the information needed 
to explore the search space (e.g., description of neighborhoods, moves, etc.). This 
choice is currently made starting from the types of the guessed tables defined in the 
specification, and future work has to be done in order to better exploit the different 
alternatives, as discusses at the end of Subsection 15.31 

COnSql simulator will be released as free and potentially open source soft- 
ware, thus allowing the system to receive improvements and extensions from the 
community. 



Appendix A Combined complexity of NP-Alg 

In this section we prove Theorem l4.2l The proof consists in reducing an NE-complete 
problem, Succint 3-coloring | |Kolaitis and Papadimitriou 199l| ), i.e., the "succinct 
version" of the graph 3-coloring problem, into an NP-Alg query. It is worth noting 
that the resulting NP-Alg query is not uniform with respect to the problem instance, 
but this is exactly what the definition of combined complexity (as opposed to data 
complexity) states. The Succint 3-coloring problem is defined as follows: 

Definition Appendix A.l (The Succint 3-coloring problem) 
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Nodes of the input graph are elements of {0, 1}™, and, instead of an explicitly given 
EDGES relation, there is a boolean circuit with In inputs and one output, such 
that the value output by the circuit is 1 if and only if the inputs are two n-tuples 
that encode a pair of nodes connected by an edge. A boolean circuit is a finite set 
of triples {(a*, &», Ci),i = 1, . . . , k}, where a, G {OR, AND, NOT, IN} is the kind 
of the gate, and bi,Ci < i are the inputs of the gate (hence, the whole circuit is 
acyclic), unless the gate is an input gate (a, = IN), in which case, say, 6j = Ci = 0. 
For NOT gates, hi — Ci. Given values in {0, 1} for the input gates, we can compute 
the values of all gates one by one by starting from the first one. The value of the 
circuit is the value of the last gate. Finally, the Succint 3-coloring problem is the 
following: Given a boolean circuit with 2n inputs and one output, is the graph thus 
presented 3-colorable? 

The Succint 3-coloring problem is proven to be NE-complete in the same paper 
QKolaitis and Pa padimi tnou 1991| ) . 

Reduction of Succint 3-coloring into an NP-Alg query. Given an input boolean 
circuit G = {gi = (a,i,bi,Ci) | 1 < i < k} with 2n inputs and one output, we 
construct the NP-Alg query ip that specifies the Succint 3-coloring problem (on the 
graph represented by circuit G) as follows. 

As for the set Q of guessed relations, we declare a relation G^ for every gate 
i, i.e., for every triple Oj = (flj, bi, c,), (1 < i < k). Moreover, we declare in Q three 
more relations, COL^ , COL% , COL$, encoding the partition of the nodes into 
3 groups, analogously to the specification for A:-coloring given in Section ITT7T1 So, 
the Guess part of the NP-Alg query being built is the following: 



Intuitively, relations Gi will contain all tuples (X, Y), with X = {X\, . . . , X n ), and 
Y = (Yi, . . . ,Y n ) (i.e., binary encodings of the nodes X and Y) for all pairs of 
nodes X and Y that make the output of the i-th gate 1. 
The expression for FAIL is of the following kind: 

FAIL = FAIL.CIRCUIT U FA IL_PA R TITION U FAIL.COLORING. 

The first subexpression evaluates to the empty relation if and only if the guessed 
extension for the Gi relations correctly encodes the circuit, while the second and 
the third ones evaluate to the empty relation if and only if relations COL\, COL2, 
COL3, are a partition of the graph nodes and a correct coloring of the graph (we 
omit their definitions, since they are very similar to those presented in Section [3.1fl . 

The expression for FAIL_CIRCUIT contains in turn one of the following subex- 
pressions FAIL_Gi, 1 < i < k, for every gate i, according to its type a^. In particular: 



Guess G[ 2n \ G% n) , COL ( { l) , COL™ , COL^; 



• If <Zj 



AND, then FAIL.Gi = G, A G bi n G c 



• If Oj 



• If Oi 



OR, then FAIL_Gi = d A G hi U G Ci ; 
NOT, then FAIL_Gi = d A [DOM 2 £ - G bi ] ; 
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• If Oj = IN, then FAIL_d = G L A cr (DOM 2 ,™), assuming that the i-th gate 

$j=i 

(of type IN) is the j-th input of the circuit. 
In the above definition, we used the relation DOM i, defined as: 

DOM m = a (DOM) 

$l^AND A 
$l#Ofl A 
$l^NOT A 
$1^IN 

that will contain at most the two tuples (0) and (1) (since DOM would also con- 
tain constants for the gate types). Thus, the expression for FAIL-CIRCUIT is the 
following: 

k 

FAIL_CIRCUIT = (J FAIL_G l . 

i=l 

It remains to prove that the expression for FAIL_CIRCUIT evaluates to the empty 
relation if and only if guessed relations G\ , . . . , Gk correctly encode the boolean 
circuit representing the input graph, i.e., if and only if for all i, relation Gi contains 
exactly all 2n-tuples (encoding pairs of nodes given as input to the circuit) that 
make the output of the i-th gate 1. This is what the following lemma claims. 

Lemma Appendix A.l 

Let G = {gi = (di,bi,Ci) | 1 < i < k} be a boolean circuit encoding a graph, and 
let ip be the NP-Alg query built as described above. An extension for guessed tables 
G[ 2n \ . . . , G^ 2 "" 1 exists such that the expression for FAIL-CIRCUIT evaluates to 
the empty relation. Moreover, for such an extension, each d contains exactly all 
2n-tuples (X\, . . . , X n , Y\, . . . , Y n ) that make the output of the -i-th gate 1. As a 
consequence, the extension for Gk contains all 2n-tuples that encode pairs of nodes 
linked by an edge. 

Proof 

We first show that, if extensions for G\, . . . Gk in the NP-Alg query i(j exist that 
make the expression for FAIL-CIRCUIT evaluate to the empty relation (by making 
all the expressions for FAIL_Gi evaluate to the empty relation), then, for every input 
{X\, . . . , X n , Y\, . . . , Y n } to the circuit, each gate i (1 < i < k) outputs 1 if and only 
if the 2n-tuple (X±, . . . , X n , Yi, . . . , Y n ) belongs to the corresponding Gi. Secondly, 
we show that such an extension indeed exists. The proof of the first point is by 
induction on the index i: 

z = l: Gate g\ is, by construction, of type IN (i.e., a\ = IN). Let us assume that 
gi is the j-th input to the circuit, i.e., its output is 1 if and only if the j-th input 
to the circuit is 1. As it can be observed from the definition of FAIL_G\, since 
by hypothesis it evaluates to the empty relation, G\ contains all 2n-tuples that 
have 1 as the j-th component. 

% > 1 : Let us assume that the lemma holds for all %' such that 1 < i' < i, and let us 
consider the i-th gate (of type a, £ {IN, AND, OR, NOT}) and the extension for 
the corresponding guessed relation d. Since, by hypothesis, FAIL.Gi evaluates 
to the empty relation, it can be easily observed by its definition that: 
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If dj = IN, assuming that gi is the j-th input to the circuit, Gi contains, 
by construction, all 2n-tuples that have 1 as the j-th. component; 
If di — AND, it follows by induction that G^ and G Ci contain exactly 
those tuples that make the output of, respectively, gates gb t e g Ci 1. By 
construction, the extension for Gi contains exactly those tuples that belong 
to both Gbi and G Ci . 

If a L = OR an analogous argument holds, showing that Gi contains exactly 
those tuples that belong to G^ or to G Ci . 

If a% = NOT, it follows by induction that (in this case bi — c,) contains 
exactly those tuples that make the output of gate 1. By construction, 
the extension for Gi contains exactly those tuples in DOMq" that do not 
belong to Gb i . 



As for the second point of the proof, it is easy to show that an extension for 
G\, . . . Gk that makes all the expressions for FAIL_Gi evaluate to the empty relation 
indeed exists. The key observation is that expressions for FAIL_Gi essentially define 
which tuples must belong to each Gi (more precisely, each FAIL_Gi evaluates to 
the empty relation if and only if Gi contains exactly the tuples that belong to the 
relational algebra expression on the right of the "A" symbol), and that the Guess 
part of the query generates all possible extensions of those relations with elements 
in DOM D DOM oi. □ 

Lemma |Appcndix A.l| claims that an extension for G\, . . . Gk in query ip that makes 
the expression for FAIL-CIRCUIT evaluate to the empty relation exists, and is 
the one that correctly models the boolean circuit representing the input graph. It 
remains to prove that the whole query ip is such that FAILO% if and only if the 
input graph is 3-colorable. This is claimed by the following result: 

Lemma Appendix A. 2 

Let G = {gi = (<ij, Cj) | 1 < i < k} be a boolean circuit encoding a graph, and let 
tp be the NP-Alg query built as described above. The expression for FAIL in ip eval- 
uates to the empty relation for a given extension of G\,...Gk, GOL\, COL 2 , COL 3 
if and only if G\,. . . Gk correctly encode the circuit G and COL\, COL 2 , COL 3 rep- 
resent a valid coloring of the input graph. Thus, FAIL<>$ if and only if the input 
graph is 3-colorable. 

Proof 

The boolean circuit G is translated into k guessed relations G\, . . . Gk- The correct- 
ness of the translation is claimed by Lemma [Appendix A.l| Moreover, the Guess 
part of query ip generates also all possible extensions for three more guessed re- 
lations, i.e., COLi, GOLi, COL 3 . As discussed in Section |3~T1 the expression for 
FAIL_PARTITION U FAIL.COL ORING evaluates to the empty relation if and only 
if COLi, COL 2 , GOL 3 define a valid coloring of the graph. □ 



From previous lemmas, it follows the proof of Thcorem l4.2l that states the combined 
complexity of NP-Alg: 
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Proof of Theorem \4-S\ 

Immediate, from Lemma | Appendix A.2| and from the NE-completeness of the Suc- 
cint 3-coloring problem {Kolai tls and Papadimitriou 19 91). □ 
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